##   X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1           7.4             0.70        0.00            1.9     0.076
## 2 2           7.8             0.88        0.00            2.6     0.098
## 3 3           7.8             0.76        0.04            2.3     0.092
## 4 4          11.2             0.28        0.56            1.9     0.075
## 5 5           7.4             0.70        0.00            1.9     0.076
## 6 6           7.4             0.66        0.00            1.8     0.075
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  11                   34  0.9978 3.51      0.56     9.4
## 2                  25                   67  0.9968 3.20      0.68     9.8
## 3                  15                   54  0.9970 3.26      0.65     9.8
## 4                  17                   60  0.9980 3.16      0.58     9.8
## 5                  11                   34  0.9978 3.51      0.56     9.4
## 6                  13                   40  0.9978 3.51      0.56     9.4
##   quality
## 1       5
## 2       5
## 3       5
## 4       6
## 5       5
## 6       5
## [1] 1599 13
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1.0 Min. : 4.60 Min. :0.1200 Min. :0.000
## 1st Qu.: 400.5 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090
## Median : 800.0 Median : 7.90 Median :0.5200 Median :0.260
## Mean : 800.0 Mean : 8.32 Mean :0.5278 Mean :0.271
## 3rd Qu.:1199.5 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420
## Max. :1599.0 Max. :15.90 Max. :1.5800 Max. :1.000
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.900 Min. :0.01200 Min. : 1.00
## 1st Qu.: 1.900 1st Qu.:0.07000 1st Qu.: 7.00
## Median : 2.200 Median :0.07900 Median :14.00
## Mean : 2.539 Mean :0.08747 Mean :15.87
## 3rd Qu.: 2.600 3rd Qu.:0.09000 3rd Qu.:21.00
## Max. :15.500 Max. :0.61100 Max. :72.00
## total.sulfur.dioxide density pH sulphates
## Min. : 6.00 Min. :0.9901 Min. :2.740 Min. :0.3300
## 1st Qu.: 22.00 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500
## Median : 38.00 Median :0.9968 Median :3.310 Median :0.6200
## Mean : 46.47 Mean :0.9967 Mean :3.311 Mean :0.6581
## 3rd Qu.: 62.00 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300
## Max. :289.00 Max. :1.0037 Max. :4.010 Max. :2.0000
## alcohol quality
## Min. : 8.40 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.20 Median :6.000
## Mean :10.42 Mean :5.636
## 3rd Qu.:11.10 3rd Qu.:6.000
## Max. :14.90 Max. :8.000
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.636 6.000 8.000
##
## 3 4 5 6 7 8
## 10 53 681 638 199 18
This dataset has 1599 observations of 13 variables, with each observation corresponding to one red wine.
Let’s begin by finding the correlation between each independent variable and the dependent variable, quality.
abs(round(cor(wines),3))[-12,"quality"]
##                    X        fixed.acidity     volatile.acidity
##                0.066                0.124                0.391
##          citric.acid       residual.sugar            chlorides
##                0.226                0.014                0.129
##  free.sulfur.dioxide total.sulfur.dioxide              density
##                0.051                0.185                0.175
##                   pH            sulphates              quality
##                0.058                0.251                1.000
The results suggest that none of the independent variables shown has a strong linear correlation with quality: the largest absolute value is 0.391, for volatile.acidity. So we need to work with multiple independent variables together to see whether we get a stronger relationship with quality.
Not yet. I may update this section if I create more variables.
We are going to turn the quality variable into a factor, as this lets us treat the task as a classification problem.
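The positional index `[-12, ]` in the call above removes the twelfth row of the correlation matrix, which is `alcohol`, while `X` and `quality` itself stay in the output. A minimal sketch of a name-based alternative follows. The helper `quality_correlations()` and its `exclude` argument are hypothetical names introduced here, and the six rows are copied from the `head()` output earlier in this report, so the printed values only demonstrate the mechanics and are not the correlations for the full data.

```r
# Rows copied from the head() output earlier in this report (subset of columns).
wines_head <- data.frame(
  X                = 1:6,
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66),
  sulphates        = c(0.56, 0.68, 0.65, 0.58, 0.56, 0.56),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4),
  quality          = c(5L, 5L, 5L, 6L, 5L, 5L)
)

# Hypothetical helper: absolute correlation of every predictor with the target.
# Columns are selected by name, so no predictor is dropped by position.
quality_correlations <- function(df, target = "quality", exclude = "X") {
  predictors <- setdiff(names(df), c(target, exclude))
  # as.character() first, so this also works once quality is a factor
  y <- as.numeric(as.character(df[[target]]))
  r <- cor(df[, predictors, drop = FALSE], y)[, 1]
  sort(abs(round(r, 3)), decreasing = TRUE)
}

cors <- quality_correlations(wines_head)
print(cors)
```

With the full data frame, `quality_correlations(wines)` would be the equivalent call, assuming `wines` still has the 13 columns listed above.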
wines$quality = as.factor(wines$quality)
summary(wines$quality)
## 3 4 5 6 7 8
## 10 53 681 638 199 18
table(wines$quality)
##
## 3 4 5 6 7 8
## 10 53 681 638 199 18
Variables that don’t change much with quality:

- fixed.acidity
- residual.sugar
- chlorides

Variables that decrease as the quality gets higher:

- volatile.acidity
- density
- pH

Variables that increase as the quality gets higher:

- citric.acid
- sulphates
- alcohol
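These groupings can also be checked numerically by computing the median of each variable per quality level. The sketch below uses base R's `aggregate()` on the six rows shown by `head()` earlier in this report (three variables only), so it covers just two quality levels and illustrates the call rather than the trend. The name `medians_by_quality` is introduced here for illustration.

```r
# Rows copied from the head() output earlier in this report (subset of columns).
wines_head <- data.frame(
  volatile.acidity = c(0.70, 0.88, 0.76, 0.28, 0.70, 0.66),
  citric.acid      = c(0.00, 0.00, 0.04, 0.56, 0.00, 0.00),
  alcohol          = c(9.4, 9.8, 9.8, 9.8, 9.4, 9.4),
  quality          = c(5L, 5L, 5L, 6L, 5L, 5L)
)

# One row per quality level, one median per remaining column.
medians_by_quality <- aggregate(. ~ quality, data = wines_head, FUN = median)
print(medians_by_quality)
```

Run on the full data, a column whose medians rise or fall steadily across the quality rows would belong in the second or third list above.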
abs(round(cor(wines[,-c(12,13)]),3))
## X fixed.acidity volatile.acidity citric.acid
## X 1.000 0.268 0.009 0.154
## fixed.acidity 0.268 1.000 0.256 0.672
## volatile.acidity 0.009 0.256 1.000 0.552
## citric.acid 0.154 0.672 0.552 1.000
## residual.sugar 0.031 0.115 0.002 0.144
## chlorides 0.120 0.094 0.061 0.204
## free.sulfur.dioxide 0.090 0.154 0.011 0.061
## total.sulfur.dioxide 0.118 0.113 0.076 0.036
## density 0.368 0.668 0.022 0.365
## pH 0.136 0.683 0.235 0.542
## sulphates 0.125 0.183 0.261 0.313
## residual.sugar chlorides free.sulfur.dioxide
## X 0.031 0.120 0.090
## fixed.acidity 0.115 0.094 0.154
## volatile.acidity 0.002 0.061 0.011
## citric.acid 0.144 0.204 0.061
## residual.sugar 1.000 0.056 0.187
## chlorides 0.056 1.000 0.006
## free.sulfur.dioxide 0.187 0.006 1.000
## total.sulfur.dioxide 0.203 0.047 0.668
## density 0.355 0.201 0.022
## pH 0.086 0.265 0.070
## sulphates 0.006 0.371 0.052
## total.sulfur.dioxide density pH sulphates
## X 0.118 0.368 0.136 0.125
## fixed.acidity 0.113 0.668 0.683 0.183
## volatile.acidity 0.076 0.022 0.235 0.261
## citric.acid 0.036 0.365 0.542 0.313
## residual.sugar 0.203 0.355 0.086 0.006
## chlorides 0.047 0.201 0.265 0.371
## free.sulfur.dioxide 0.668 0.022 0.070 0.052
## total.sulfur.dioxide 1.000 0.071 0.066 0.043
## density 0.071 1.000 0.342 0.149
## pH 0.066 0.342 1.000 0.197
## sulphates 0.043 0.149 0.197 1.000
The strongest relationship in this matrix is between pH and fixed.acidity (absolute correlation 0.683).
The following pairs of variables, taken together, seem to show interesting relationships for distinguishing between higher-quality and lower-quality wines:

- alcohol & chlorides
- alcohol & volatile.acidity
- alcohol & sulphates
- sulphates & volatile.acidity
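The strongest pair can be picked out of a correlation matrix programmatically instead of by eye. The sketch below is one way to do it. `strongest_pair()` is a hypothetical helper, and the small matrix is a four-variable excerpt typed in from the output above, not a fresh computation. Note also that `-c(12, 13)` in the call above excludes both `alcohol` and `quality`, so `alcohol` is not part of this comparison.

```r
# Excerpt of the absolute correlations printed above (four variables only).
vars <- c("fixed.acidity", "citric.acid", "density", "pH")
cm <- matrix(c(
  1.000, 0.672, 0.668, 0.683,
  0.672, 1.000, 0.365, 0.542,
  0.668, 0.365, 1.000, 0.342,
  0.683, 0.542, 0.342, 1.000
), nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# Hypothetical helper: largest off-diagonal absolute correlation in a matrix.
strongest_pair <- function(cm) {
  cm <- abs(cm)
  cm[!upper.tri(cm)] <- NA  # ignore the diagonal and the mirrored lower half
  idx <- which(cm == max(cm, na.rm = TRUE), arr.ind = TRUE)[1, ]
  list(
    var1 = rownames(cm)[idx[["row"]]],
    var2 = colnames(cm)[idx[["col"]]],
    r    = cm[idx[["row"]], idx[["col"]]]
  )
}

best <- strongest_pair(cm)
print(best)
```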
No, didn’t see any worth mentioning.
A very large share of the wines are of quality 5 or 6 (1319 of 1599, more than four fifths).
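The share can be verified directly from the counts printed by `table(wines$quality)` earlier in this report. The variable names below are introduced here for illustration.

```r
# Counts per quality level, copied from the table() output in this report.
quality_counts <- c(`3` = 10, `4` = 53, `5` = 681, `6` = 638, `7` = 199, `8` = 18)

# Proportion of wines at each quality level.
shares <- quality_counts / sum(quality_counts)
print(round(shares, 3))

# Combined share of quality 5 and 6.
share_5_or_6 <- sum(quality_counts[c("5", "6")]) / sum(quality_counts)
print(round(share_5_or_6, 3))
```

With the full data frame, `prop.table(table(wines$quality))` would give the same proportions.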
As the wine quality increases, the median values of sulphates, alcohol and citric.acid increase, and the median values of volatile.acidity, density and pH decrease.
Combinations of two variables give us a little insight into the difference between wines of higher and lower quality.
We started by trying to find variables that, individually or in combination, influence the quality of the wine. We conclude that no single variable can predict the quality of the wine on its own. We would need to use multiple variables and do more analysis, probably with more data.